{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PubChemPy examples\n",
    "\n",
    "## Table of Contents\n",
    "\n",
    "- [1. Introduction](1-introduction.ipynb)\n",
    "- [2. Getting Started](2-getting-started.ipynb)\n",
    "\n",
    "# 2. Getting Started\n",
    "\n",
    "## Retrieving a Compound\n",
    "\n",
    "Retrieving information about a specific Compound in the PubChem database is simple.\n",
    "\n",
    "Begin by importing PubChemPy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pubchempy as pcp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s get the Compound with [CID 5090](https://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5090):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Compound(5090)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c = pcp.Compound.from_cid(5090)\n",
    "c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have a `Compound` object called `c`. We can get all the information we need from this object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "C17H14O4S\n"
     ]
    }
   ],
   "source": [
    "print(c.molecular_formula)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "314.35566\n"
     ]
    }
   ],
   "source": [
    "print(c.molecular_weight)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CS(=O)(=O)C1=CC=C(C=C1)C2=C(C(=O)OC2)C3=CC=CC=C3\n"
     ]
    }
   ],
   "source": [
    "print(c.isomeric_smiles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.3\n"
     ]
    }
   ],
   "source": [
    "print(c.xlogp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3-(4-methylsulfonylphenyl)-4-phenyl-2H-furan-5-one\n"
     ]
    }
   ],
   "source": [
    "print(c.iupac_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[u'rofecoxib', u'Vioxx', u'Ceoxx', u'162011-90-7', u'MK 966', u'MK-966', u'4-[4-(methylsulfonyl)phenyl]-3-phenylfuran-2(5H)-one', u'MK-0966', u'Vioxx (trademark)', u'MK 0966', u'CCRIS 8967', u'CHEBI:8887', u'Vioxx (TN)', u'HSDB 7262', u'Spectrum_000119', u'SpecPlus_000669', u'Spectrum2_000446', u'Spectrum3_001153', u'Spectrum4_000631', u'Spectrum5_001598', u'UNII-0QTW8Z7MCR', u'MK 996', u'MK0966', u'CHEMBL122', u'AC1L1JL6', u'KS-1107', u'3-(4-methylsulfonylphenyl)-4-phenyl-2H-furan-5-one', u'4-(4-methylsulfonylphenyl)-3-phenyl-5H-furan-2-one', u'NCGC00095118-01', u'BSPBio_002705', u'KBioGR_001242', u'KBioGR_002345', u'KBioSS_000559', u'KBioSS_002348', u'Rofecoxib (JAN/USAN/INN)', u'BIDD:GT0399', u'Bio-0094', u'DivK1c_006765', u'4-[4-(methylsulfonyl)phenyl]-3-phenyl-2(5H)-furanone', u'SPBio_000492', u'SPECTRUM1504235', u'MLS000759440', u'MLS001165770', u'MLS001195623', u'MLS001424113', u'Jsp003237', u'C17H14O4S', u'KBio1_001709', u'KBio2_000559', u'KBio2_002345', u'KBio2_003127', u'KBio2_004913', u'KBio2_005695', u'KBio2_007481', u'KBio3_002205', u'KBio3_002825', u'cMAP_000024', u'MolPort-000-883-878', u'MolPort-006-817-786', u'HMS1922H11', u'HMS2051G16', u'HMS2089H20', u'HMS2093E04', u'NSC720256', u'STK635144', u'ZINC00007455', u'AKOS000280931', u'4-(4-(Methylsulfonyl)phenyl)-3-phenyl-2(5H)-furanone', u'4-(p-(Methylsulfonyl)phenyl)-3-phenyl-2(5H)-furanone', u'DB00533', u'MK 0996', u'2(5H)-Furanone, 4-(4-(methylsulfonyl)phenyl)-3-phenyl-', u'3-Phenyl-4-(4-(methylsulfonyl)phenyl))-2(5H)-furanone', u'NCGC00095118-02', u'NCGC00095118-03', u'NCGC00095118-04', u'AC-13144', u'CPD000466331', u'LS-70511', u'NCI60_041175', u'SAM001246617', u'SMR000466331', u'FT-0081390', u'C07590', u'D00568', u'L000912', u'186912-82-3', u'BRD-K21733600-001-02-6', u'I01-1042', u'3-phenyl-4-[4-(methylsulfonyl)phenyl]-2(5H)-furanone', u'4-(4-(Methylsulfonyl)phenyl)-3-phenylfuran-2(5H)-one', u'refecoxib', u'2(5H)-Furanone, 4-[4-(methyl-sulfonyl)phenyl]-3-phenyl-', u'Vioxx Dolor', u'MSD brand of rofecoxib', u'Merck brand of rofecoxib', u'2(5H)-Furanone, 4-[4-(methylsulfonyl)phenyl]-3-phenyl-', u'Merck Frosst brand of rofecoxib', u'CID5090', u'Rofecoxib (Vioxx)', u'Rofecoxib [USAN]', u'Cahill May Roberts brand of rofecoxib', u'Merck Sharp & Dhome brand of rofecoxib', u'PubChem15028', u'SureCN3050', u'DSSTox_CID_3567', u'AGN-PC-00E0TK', u'DSSTox_RID_77084', u'C116926', u'DSSTox_GSID_23567', u'Rofecoxib [USAN:INN:BAN]', u'GTPL2893', u'HMS2232G21', u'HMS3371P11', u'HMS3393G16', u'MK966', u'Pharmakon1600-01504235', u'Tox21_111430', u'ANW-71936', u'CCG-40253', u'DAP001338', u'NSC758705', u'AB07701', u'BD41342', u'CS-0997', u'MCULE-4806636118', u'NC00132', u'NSC-720256', u'NSC-758705', u'NCGC00095118-05', u'AK-60971', u'HY-17372', u'CAS-162011-90-7', u'FT-0631192', u'K-5064', u'AB00052090-06', u'AB00052090-08', u'A810324', u'3B2-0954', u'Rofecoxib|162011-90-7|Vioxx|MK966|MK-966', u'4-(4-METHANESULFONYL-PHENYL)-3-PHENYL-5H-FURAN-2-ONE', u'4-(4-methanesulfonylphenyl)-3-phenyl-2,5-dihydrofuran-2-one']\n"
     ]
    }
   ],
   "source": [
    "print(c.synonyms)"
   ]
  },
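  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Several of the attributes shown above can also be gathered in one call. The cell below is a minimal sketch, assuming the installed PubChemPy version provides `Compound.to_dict()` with a `properties` argument. The selection of properties and the name `summary` are only illustrative:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pubchempy as pcp\n",
    "\n",
    "# Fetch the same compound as above (requires network access to PubChem)\n",
    "c = pcp.Compound.from_cid(5090)\n",
    "\n",
    "# Collect an illustrative subset of properties into a plain dictionary\n",
    "wanted = ['molecular_formula', 'molecular_weight', 'iupac_name', 'xlogp']\n",
    "summary = c.to_dict(properties=wanted)\n",
    "\n",
    "# Print the properties in a stable (alphabetical) order\n",
    "for key, value in sorted(summary.items()):\n",
    "    print('{}: {}'.format(key, value))"
   ]
  },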
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Searching\n",
    "\n",
    "What if you don’t know the PubChem CID of the Compound you want? Just use the `get_compounds()` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Compound(5793), Compound(79025), Compound(64689), Compound(206)]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results = pcp.get_compounds('Glucose', 'name')\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first argument is the identifier, and the second argument is the identifier type, which must be one of `name`, `smiles`, `sdf`, `inchi`, `inchikey` or `formula`. This search returned four compounds in the PubChem database that have the name Glucose associated with them. Let’s take a look at them in more detail:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O\n",
      "C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O\n",
      "C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O\n",
      "C(C1C(C(C(C(O1)O)O)O)O)O\n"
     ]
    }
   ],
   "source": [
    "for compound in results:\n",
    "    print(compound.isomeric_smiles)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like they all have different stereochemistry information.\n",
    "\n",
    "Retrieving the record for a SMILES string is just as easy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Compound(1318)]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pcp.get_compounds('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles')"
   ]
  },
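  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The other identifier types are passed in the same way. As an illustration that is not part of the original examples, the sketch below looks up a compound by InChIKey. The key used here is the standard InChIKey for aspirin, and the variable name `matches` is arbitrary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pubchempy as pcp\n",
    "\n",
    "# Search by InChIKey instead of name or SMILES (requires network access)\n",
    "matches = pcp.get_compounds('BSYNRYMUTXBXSQ-UHFFFAOYSA-N', 'inchikey')\n",
    "\n",
    "# Show the CID and IUPAC name of every record returned\n",
    "for compound in matches:\n",
    "    print('{}: {}'.format(compound.cid, compound.iupac_name))"
   ]
  },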
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Be aware that line notation inputs such as SMILES and InChI can return automatically generated records that aren’t actually present in PubChem. These records have no CID and are missing many properties that are too complicated to calculate on the fly."
   ]
  },
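  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to guard against such records is to check the CID before relying on other properties. The cell below is a sketch under the assumption that the `cid` attribute of an automatically generated record is `None`. It reuses the SMILES string from the earlier example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pubchempy as pcp\n",
    "\n",
    "# Same SMILES query as above (requires network access to PubChem)\n",
    "results = pcp.get_compounds('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles')\n",
    "\n",
    "for compound in results:\n",
    "    if compound.cid is None:\n",
    "        # Assumed marker of a record generated on the fly\n",
    "        print('No CID: record is not present in PubChem')\n",
    "    else:\n",
    "        print('{}: {}'.format(compound.cid, compound.iupac_name))"
   ]
  },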
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "Previous: [Introduction](1-introduction.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
